HAPPY Team Entry to NIST OpenSAD Challenge: A Fusion of Short-Term Unsupervised and Segment i-Vector Based Speech Activity Detectors

Authors

  • Tomi Kinnunen
  • Alexey Sholokhov
  • Elie el Khoury
  • Dennis Alexander Lehmann Thomsen
  • Md. Sahidullah
  • Zheng-Hua Tan
Abstract

Speech activity detection (SAD), the task of locating speech segments in a given recording, remains challenging under acoustically degraded conditions. In 2015, the National Institute of Standards and Technology (NIST) coordinated the OpenSAD benchmark. We summarize the “HAPPY” team's effort for OpenSAD. SADs come in both unsupervised and supervised flavors, the latter requiring a labeled training set. Our solution fuses six base SADs (two supervised and four unsupervised). The individually best SAD, in terms of detection cost function (DCF), is supervised and uses adaptive segmentation with i-vectors to represent the segments. Fusion of the six base SADs yields a relative decrease of 9.3 % in DCF over this SAD. A further relative decrease of 17.4 % is obtained by incorporating channel detection side information.
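
The DCF figures above are relative decreases in a weighted combination of miss and false-alarm rates. The following is a minimal sketch of that arithmetic, not the evaluation's official scoring code. The weights 0.75 and 0.25 are an assumption (the abstract does not state them), the function names are hypothetical, and the error rates are made-up values used only for illustration, not results from the paper.

```python
def detection_cost(p_miss: float, p_fa: float,
                   w_miss: float = 0.75, w_fa: float = 0.25) -> float:
    """Weighted detection cost from miss and false-alarm rates.

    The default weights are an assumption, not taken from the abstract.
    """
    return w_miss * p_miss + w_fa * p_fa


def relative_decrease(baseline: float, improved: float) -> float:
    """Relative decrease of `improved` with respect to `baseline`, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return 100.0 * (baseline - improved) / baseline


# Made-up error rates, for illustration only (not results from the paper).
best_single = detection_cost(p_miss=0.04, p_fa=0.08)
fused = detection_cost(p_miss=0.035, p_fa=0.075)
print(f"relative DCF decrease: {relative_decrease(best_single, fused):.1f} %")
```

A relative decrease is measured against the baseline cost, so the same absolute gain counts for more when the baseline DCF is already small.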


Similar resources

The SRI System for the NIST OpenSAD 2015 Speech Activity Detection Evaluation

In this paper, we present the SRI system submission to the NIST OpenSAD 2015 speech activity detection (SAD) evaluation. We present results on three different development databases that we created from the provided data. We present system-development results for feature normalization; for feature fusion with acoustic, voicing, and channel bottleneck features; and finally for SAD bottleneck-feat...


I-Vectors for Speech Activity Detection

I-Vectors are low dimensional front-end features known to effectively preserve the total variability of the signal. Motivated by their successful use for several classification problems such as speaker, language and face recognition, this paper introduces i-vectors for the task of speech activity detection (SAD). In contrast to most state-of-the-art SAD methods that operate at the frame or segm...


STC Speaker Recognition System for the NIST i-Vector Challenge

This paper presents a Speech Technology Center (STC) system submitted to the NIST i-vector Challenge. The system includes different subsystems based on PLDA, LDA-SVM, RBM-PLDA and DBN-PLDA. We propose an original iterative scheme for clustering the NIST i-vector Challenge devset. We also introduce the RBM-PLDA subsystem in the NIST i-vector Challenge. Experiments performed on the progress datas...


RBM-PLDA subsystem for the NIST i-vector challenge

This paper presents the Speech Technology Center (STC) system submitted to NIST i-vector challenge. The system includes different subsystems based on TV-PLDA, TV-SVM, and RBM-PLDA. In this paper we focus on examining the third RBM-PLDA subsystem. Within this subsystem, we present our RBM extractor of the pseudo i-vector. Experiments performed on the test dataset of NIST-2014 demonstrate that al...


Investigating State-of-the-Art Speaker Verification in the case of Unlabeled Development Data

In this study, we describe the systems developed by the Center for Robust Speech Systems (CRSS), Univ. of Texas Dallas, for the NIST i-vector challenge. Given the emphasis of this challenge is on utilizing unlabeled development data, our system development focuses on: 1) leveraging the channel variation from unlabeled development data through unsupervised clustering; 2) investigating different ...



Publication date: 2016